Estimating Cache Performance for Sequential and Data Parallel Programs
Abstract
This paper introduces an analytical model that automatically estimates the cache performance of both sequential and data parallel Fortran programs. The estimation is based on a classification of array accesses with respect to cache reuse at the source code level. An estimated upper bound on the number of distinct cache lines accessed inside a loop is computed statically. From this estimate, the number of cache misses for loops, procedures, and the entire program can be predicted. The method has been implemented as part of P3T (Parameter based Performance Prediction Tool) and supports VFCS (Vienna Fortran Compilation System) in guiding the application of data distributions and program transformations on distributed memory multiprocessor systems to achieve greater cache effectiveness. Experiments are presented that demonstrate the efficacy of the approach.
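To make the idea of a static bound on distinct cache lines concrete, the following is a minimal sketch for a single strided array reference in a loop. It is not the model from the paper: the function name, its parameters, and the alignment assumptions (elements aligned to their own size, element size dividing the line size, unknown alignment of the array base relative to a line boundary) are all illustrative assumptions.

```python
def distinct_lines_upper_bound(trip_count, stride_bytes, elem_bytes, line_bytes):
    """Upper bound on distinct cache lines touched by one strided reference.

    Hypothetical helper, not taken from the paper. Assumes each element is
    aligned to its own size and elem_bytes divides line_bytes, so a single
    access never straddles two cache lines.
    """
    if elem_bytes <= 0 or line_bytes <= 0 or line_bytes % elem_bytes != 0:
        raise ValueError("elem_bytes must be positive and divide line_bytes")
    if trip_count <= 0:
        return 0

    # Bytes between the first byte of the first element and the last byte
    # of the last element accessed by the loop.
    span = (trip_count - 1) * abs(stride_bytes) + elem_bytes

    # Worst case alignment: the first element sits in the last slot of a line.
    lines_in_span = (line_bytes - elem_bytes + span - 1) // line_bytes + 1

    # Each iteration touches exactly one line, so trip_count is also a bound.
    return min(trip_count, lines_in_span)


# Unit stride over 100 eight-byte elements with 64-byte lines.
print(distinct_lines_upper_bound(100, 8, 8, 64))    # prints 14
# A stride of two full lines: every access lands on a different line.
print(distinct_lines_upper_bound(100, 128, 8, 64))  # prints 100
```

If the cache is initially empty and no line is evicted during the loop, each distinct line can miss at most once, so under those additional assumptions the same value also bounds the misses caused by this reference.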
Related resources
A Preliminary Evaluation of Cache-miss-initiated Prefetching Techniques in Scalable Multiprocessors
Prefetching is an important technique for reducing the average latency of memory accesses in scalable cache-coherent multiprocessors. Aggressive prefetching can significantly reduce the number of cache misses, but may introduce bursty network and memory traffic, and increase data sharing and cache pollution. Given that we anticipate enormous increases in both network bandwidth and latency, we exam...
Coherence Miss Classification for Performance Debugging in Multi-Core Processors
Multi-core processors offer large performance potential for parallel applications, but writing these applications is notoriously difficult. Tuning a parallel application to achieve scalability, referred to as performance debugging, is often more challenging for programmers than conventional debugging for correctness. Parallel programs have several performance related issues that are not seen in...
Hardware Support for Data Dependence Speculation in Distributed Shared-Memory Multiprocessors Via Cache-block Reconciliation
Data dependence speculation allows a compiler to relax the constraint of data-independence to issue tasks in parallel, increasing the potential for automatic extraction of parallelism from sequential programs. This paper proposes hardware mechanisms to support a data-dependence speculative distributed shared-memory (DDSM) architecture that enable speculative parallelization of programs with irr...
DCompose: A Tool for Measuring Data Decomposition on Distributed Memory Multiprocessors
In converting sequential programs for execution on distributed memory parallel processors, the programmer must determine the optimal data decomposition for the data structures. This task is an extremely complex optimisation problem and thus is usually performed manually. This chapter describes an X based visualisation tool called DCompose, which allows a programmer to measure the efficiency of ...
P3T: An Automatic Performance Estimator for Parallel Programs
The area of parallelizing compilers for distributed memory multicomputers has seen considerable research activity during the last few years. Most of the current compilers do not provide any support for estimating performance impacts of code changes that they apply. In this paper, we present P3T, which is a static and automatic performance estimator for data parallel programs. It computes at c...
Publication date: 1997